[KV Connector] Make KVCacheConfig an explicit constructor argument#27887
[KV Connector] Make KVCacheConfig an explicit constructor argument#27887KuntaiDu merged 1 commit intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors KV connector instantiation to explicitly pass KVCacheConfig, which is a good improvement for separating runtime state from global configuration. The implementation of backward compatibility for external connectors is also well-handled in the factory. However, I've identified a critical issue in MultiConnector where it incorrectly instantiates its sub-connectors, which breaks backward compatibility and fails to pass the kv_cache_config. I've also found a minor issue in a test utility. Please see my comments for details and suggestions.
|
See also #25712 (comment) |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
xref #27811 |
4513b7c to
15ec91b
Compare
Follow on from vllm-project#25712 `VllmConfig` is explicitly designed as a dataclass containing user-provided configuration and model metadata. It is a global configuration object that lives throughout the entire engine lifetime and is meant to be immutable after `__post_init__()`. `KVCacheConfig` is worker-specific, runtime-computed state. It has limited lifetime, and its purpose is limited to initializing the KV Cache in the model runner. Even if we add KV cache hints to model config.json in future, this would be parsed into `ModelConfig`, used as input to the `get_kv_cache_configs()` computation, and the resulting `KVCacheConfig` would still be runtime state. We are currently creating per-worker copies of VllmConfig in order to attach the runtime `KVCacheConfig` state. But instead we should just explicitly pass `KVCacheConfig` to the connector. Make sure to handle backwards compatibility for external connector implementations (loaded via module path) that have the old style constructor signature. Signed-off-by: Mark McLoughlin <markmc@redhat.com>
15ec91b to
fdc30ae
Compare
KuntaiDu
left a comment
There was a problem hiding this comment.
In general LGTM. Some nits listed in comments.
| f"Class {connector_name} not found in {connector_module_path}" | ||
| ) from e | ||
| connector_cls = cast(type[KVConnectorBaseType], connector_cls) | ||
| if not supports_kw(connector_cls, "kv_cache_config"): |
There was a problem hiding this comment.
Just to confirm: this means that we allow connector to include kv_cache_config field as init args even when it does not support hybrid allocator (not a subclass of SupportsHMA)?
There was a problem hiding this comment.
Yeah. Since we we are currently unconditionally attaching KVCacheConfig to VllmConfig, I didn't even consider making it conditional until now
If we think that KVCacheConfig will only be used as part of implementing request_finished_all_groups() we could add init_hma() or init_kv_cache_config() to SupportsHMA ?
There was a problem hiding this comment.
KVCacheConfig takes effect on almost all connector methods when HMA is enabled and it is not useful at all when HMA is disabled.
That said, the current implementation looks good to me as it is simple enough.
| @@ -0,0 +1,275 @@ | |||
| # SPDX-License-Identifier: Apache-2.0 | |||
There was a problem hiding this comment.
Maybe rename this file? Like test_connector_init_with_kv_cache_config or something.
There was a problem hiding this comment.
Obviously I don't really mind renaming it, if it helps, but my thinking is that these tests are about testing support for connectors that have not yet been updated to the new signature so it's more like "without_kv_cache_config()"
Basically, because all connectors are expected to support the new signature, we'll soon see these tests as old cruft that we need to keep around for a while
(This is different from the SupportsHMA approach - in that case, maybe only a small subset of connectors would be updated to take KVCacheConfig)
|
Just capturing some of the points from the Slack discussion, since I think it was very useful 👍
|
NickLucche
left a comment
There was a problem hiding this comment.
lgtm, let's keep compat code under control though
| self, | ||
| vllm_config: "VllmConfig", | ||
| role: KVConnectorRole, | ||
| kv_cache_config: "KVCacheConfig", |
There was a problem hiding this comment.
@KuntaiDu do you need to change LMCache connector? kv_cache_config is not available in vllm config now.
There was a problem hiding this comment.
I need to change LMCache PR correspondingly, but we can merge this PR first and I can change it in LMCache afterwards.
heheda12345
left a comment
There was a problem hiding this comment.
LGTM! Leave this PR to @KuntaiDu for compatibility with LMCache + HMA
Thanks for reminder! I will fix LMCache compatibility after this PR is merged (still a lot of work on LMCache side before turning on HMA by default for LMCache). |
…llm-project#27887) Signed-off-by: Mark McLoughlin <markmc@redhat.com>
…llm-project#27887) Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Follow on from #25712
VllmConfigis explicitly designed as a dataclass containing user-provided configuration and model metadata. It is a global configuration object that lives throughout the entire engine lifetime and is meant to be immutable after__post_init__().KVCacheConfigis worker-specific, runtime-computed state. It has limited lifetime, and its purpose is limited to initializing the KV Cache in the model runner.Even if we add KV cache hints to model config.json in future, this would be parsed into
ModelConfig, used as input to theget_kv_cache_configs()computation, and the resultingKVCacheConfigwould still be runtime state.We are currently creating per-worker copies of VllmConfig in order to attach the runtime
KVCacheConfigstate. But instead we should just explicitly passKVCacheConfigto the connector.Make sure to handle backwards compatibility for external connector implementations (loaded via module path) that have the old style constructor signature.